{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "ur8xi4C7S06n"
   },
   "outputs": [],
   "source": [
    "# Copyright 2019 Google LLC\n",
    "#\n",
    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "#     https://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "HosWdaE-KieL"
   },
   "source": [
    "# **Slicing AutoML Tables Evaluation Results with BigQuery**\n",
    "\n",
    "<table align=\"left\">\n",
    "  <td>\n",
    "    <a href=\"https://colab.sandbox.google.com/github/GoogleCloudPlatform/ai-platform-samples/blob/master/notebooks/samples/tables/result_slicing/slicing_eval_results.ipynb\">\n",
    "      <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Colab logo\"> Run in Colab\n",
    "    </a>\n",
    "  </td>\n",
    "  <td>\n",
    "    <a href=\"https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/notebooks/samples/tables/result_slicing/slicing_eval_results.ipynb\">\n",
    "      <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\">\n",
    "      View on GitHub\n",
    "    </a>\n",
    "  </td>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "MowcN4adM7eH"
   },
   "source": [
    "## **Overview**\n",
    "This colab assumes that you've created a dataset with AutoML Tables, and used that dataset to train a classification model. Once the model is done training, you also need to export the results table by using the following instructions. You'll see more detailed setup instructions below.\n",
    "\n",
    "This colab will walk you through the process of using BigQuery to visualize data slices, showing you one simple way to evaluate your model for bias.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "uLcF5EMyIDWe"
   },
   "source": [
    "### **Dataset**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ECb9SlJLnajE"
   },
   "source": [
    "\n",
    "You'll need to use the AutoML Tables frontend or service to create a model and export its evaluation results to BigQuery. You should find a link on the Evaluate tab to view your evaluation results in BigQuery once you've finished training your model. Then navigate to BigQuery in your GCP console and you'll see your new results table in the list of tables to which your project has access.\n",
    "\n",
    "For demo purposes, we'll be using the [Default of Credit Card Clients](https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients) dataset for analysis.\n",
    "\n",
    "**Note:** Although the data we use in this demo is public, you'll need to enter your own Google Cloud project ID in the parameter below to authenticate to it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "kXQXf1W8IKPK"
   },
   "source": [
    "### **Objective**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "zndbRXq4ne8K"
   },
   "source": [
    "\n",
    "This dataset was collected to help compare different methods of predicting credit card default. Using this colab to analyze your own dataset may require a little adaptation.\n",
    "The code below will sample if you want it to. Or you can set sample_count to be as large or larger than your dataset to use the whole thing for analysis.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "w4YELJp6O_xw"
   },
   "source": [
    "### **Costs**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "74OP8KFwO_gs"
   },
   "source": [
    "This tutorial uses billable components of Google Cloud Platform (GCP):\n",
    "\n",
    "* Cloud AI Platform\n",
    "* BigQuery\n",
    "\n",
    "Learn about [Cloud AI Platform pricing](https://cloud.google.com/ml-engine/docs/pricing), [BigQuery pricing](https://cloud.google.com/bigquery/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ze4-nDLfK4pw"
   },
   "source": [
    "## **Set up your local development environment**\n",
    "\n",
    "**If you are using Colab or AI Platform Notebooks**, your environment already meets\n",
    "all the requirements to run this notebook. If you are using **AI Platform Notebook**, make sure the machine configuration type is **1 vCPU, 3.75 GB RAM** or above and environment as **Python or TensorFlow Enterprise 1.15**. You can skip this step."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "gCuSR8GkAgzl"
   },
   "source": [
    "**Otherwise**, make sure your environment meets this notebook's requirements.\n",
    "You need the following:\n",
    "\n",
    "* The Google Cloud SDK\n",
    "* Git\n",
    "* Python 3\n",
    "* virtualenv\n",
    "* Jupyter notebook running in a virtual environment with Python 3\n",
    "\n",
    "The Google Cloud guide to [Setting up a Python development\n",
    "environment](https://cloud.google.com/python/setup) and the [Jupyter\n",
    "installation guide](https://jupyter.org/install) provide detailed instructions\n",
    "for meeting these requirements. The following steps provide a condensed set of\n",
    "instructions:\n",
    "\n",
    "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n",
    "\n",
    "2. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n",
    "\n",
    "3. [Install\n",
    "   virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n",
    "   and create a virtual environment that uses Python 3.\n",
    "\n",
    "4. Activate that environment and run `pip install jupyter` in a shell to install\n",
    "   Jupyter.\n",
    "\n",
    "5. Run `jupyter notebook` in a shell to launch Jupyter.\n",
    "\n",
    "6. Open this notebook in the Jupyter Notebook Dashboard."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "BF1j6f9HApxa"
   },
   "source": [
    "## **Set up your GCP project**\n",
    "\n",
    "**The following steps are required, regardless of your notebook environment.**\n",
    "\n",
    "1. [Select or create a GCP project.](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n",
    "\n",
    "2. [Make sure that billing is enabled for your project.](https://cloud.google.com/billing/docs/how-to/modify-project)\n",
    "\n",
    "3. [Enable the AI Platform APIs and Compute Engine APIs.](https://console.cloud.google.com/flows/enableapi?apiid=ml.googleapis.com,compute_component)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "N-nqtnSQRISO"
   },
   "source": [
    "## **PIP Install Packages and dependencies**\n",
    "\n",
    "Install additional dependencies not installed in Notebook environment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "wyy5Lbnzg5fi"
   },
   "outputs": [],
   "source": [
    "! pip install --upgrade --quiet --user sklearn\n",
    "! pip install --upgrade --quiet --user witwidget\n",
    "! pip install --upgrade --quiet --user tensorflow==1.15\n",
    "! pip install --upgrade --quiet --user tensorflow_model_analysis\n",
    "! pip install --upgrade --quiet --user pandas-gbq"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "qjXdLSh9EHu7"
   },
   "source": [
    "Note: Try installing using `sudo`, if the above command throw any permission errors. You can **ignore other errors** and continue to next steps."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "8KN_WEoGTMG_"
   },
   "source": [
    "Skip the below cell if you are using Colab.\n",
    "\n",
    "If you are using **AI Notebook Platform > JupyterLab**. Install following packages.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "8UiMePfgTmfe"
   },
   "outputs": [],
   "source": [
    "! sudo jupyter labextension install wit-widget\n",
    "! sudo jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
    "! sudo jupyter labextension install wit-widget@1.3\n",
    "! sudo jupyter labextension install jupyter-matplotlib"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "w1os-ocbUIpC"
   },
   "source": [
    "Skip the below cell if you are using Colab.\n",
    "\n",
    "If you are using **AI Notebook Platform > Classic Notebook** or **Local Environment**. Install and enable following dependencies to link WitWidget and TFMA with notebook extensions.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "2GurPHG_UfBP"
   },
   "outputs": [],
   "source": [
    "! jupyter nbextension enable --py widgetsnbextension\n",
    "! jupyter nbextension install --py --symlink tensorflow_model_analysis\n",
    "! jupyter nbextension enable --py tensorflow_model_analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "kK5JATKPNf3I"
   },
   "source": [
    "**Note:** Try installing using `--user`, if the above command throw any permission errors."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "f-YlNVLTYXXN"
   },
   "source": [
    "`Restart` the kernel to allow the libraries to be imported for Jupyter Notebooks.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "C16j_LPrYbZa"
   },
   "outputs": [],
   "source": [
    "from IPython.core.display import HTML\n",
    "HTML(\"<script>Jupyter.notebook.kernel.restart()</script>\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ekfuBcMyCrfu"
   },
   "source": [
    "`Refresh` the browser for visualization while running in Jupyter Notebooks"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "chUbwXRjP2UU"
   },
   "source": [
    "## **Set up your GCP Project Id**\n",
    "\n",
    "Enter your `Project Id` in the cell below. Then run the  cell to make sure the\n",
    "Cloud SDK uses the right project for all the commands in this notebook.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "oM1iC_MfAts1"
   },
   "outputs": [],
   "source": [
    "PROJECT_ID = \"[your-project-id]\" #@param {type:\"string\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "dr--iN2kAylZ"
   },
   "source": [
    "## **Authenticate your GCP account**\n",
    "\n",
    "**If you are using AI Platform Notebooks**, your environment is already\n",
    "authenticated. Skip this step."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "3yyVCJHFSEKG"
   },
   "source": [
    "Otherwise, follow these steps:\n",
    "\n",
    "1. In the GCP Console, go to the [**Create service account key**\n",
    "   page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n",
    "\n",
    "2. From the **Service account** drop-down list, select **New service account**.\n",
    "\n",
    "3. In the **Service account name** field, enter a name.\n",
    "\n",
    "4. From the **Role** drop-down list, select\n",
    "   **BigQuery > BigQuery User**.\n",
    "\n",
    "5. Click *Create*. A JSON file that contains your key downloads to your\n",
    "local environment."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Yt6PhVG0UdF1"
   },
   "source": [
    "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "q5TeVHKDMOJF"
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "\n",
    "# Upload the downloaded JSON file that contains your key.\n",
    "if 'google.colab' in sys.modules:    \n",
    "  from google.colab import files\n",
    "  keyfile_upload = files.upload()\n",
    "  keyfile = list(keyfile_upload.keys())[0]\n",
    "  %env GOOGLE_APPLICATION_CREDENTIALS $keyfile\n",
    "  ! gcloud auth activate-service-account --key-file $keyfile"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "d1bnPeDVMR5Q"
   },
   "source": [
    "***If you are running the notebook locally***, enter the path to your service account key as the `GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "fsVNKXESYoeQ"
   },
   "outputs": [],
   "source": [
    "# If you are running this notebook locally, replace the string below with the\n",
    "# path to your service account key and run this cell to authenticate your GCP\n",
    "# account.\n",
    "\n",
    "%env GOOGLE_APPLICATION_CREDENTIALS /path/to/service/account\n",
    "! gcloud auth activate-service-account --key-file '/path/to/service/account'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "XoEqT2Y4DJmf"
   },
   "source": [
    "## **Import libraries and define constants**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "tR6KXS3dJ3sx"
   },
   "source": [
    "Import relevant packages.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "pRUOFELefqf1"
   },
   "outputs": [],
   "source": [
    "from __future__ import absolute_import\n",
    "from __future__ import division\n",
    "from __future__ import print_function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "LdWSxWQWMm1w"
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import os\n",
    "import pandas as pd\n",
    "import sys\n",
    "sys.path.append('./python')\n",
    "from sklearn.metrics import confusion_matrix\n",
    "from sklearn.metrics import accuracy_score, roc_curve, roc_auc_score\n",
    "from sklearn.metrics import precision_recall_curve\n",
    "from collections import OrderedDict"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "1GSz7kjjMjVP"
   },
   "outputs": [],
   "source": [
    "# For facets.\n",
    "from IPython.core.display import display, HTML\n",
    "import base64\n",
    "import witwidget.notebook.visualization as visualization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "cA1vWKC3MeQn"
   },
   "outputs": [],
   "source": [
    "# Tensorflow model analysis\n",
    "import apache_beam as beam\n",
    "import tempfile\n",
    "from google.protobuf import text_format\n",
    "from tensorflow_model_analysis import post_export_metrics\n",
    "from tensorflow_model_analysis import types\n",
    "from tensorflow_model_analysis.api import model_eval_lib\n",
    "from tensorflow_model_analysis.evaluators import aggregate\n",
    "from tensorflow_model_analysis.extractors import slice_key_extractor\n",
    "from tensorflow_model_analysis.model_agnostic_eval import model_agnostic_evaluate_graph\n",
    "from tensorflow_model_analysis.model_agnostic_eval import model_agnostic_extractor\n",
    "from tensorflow_model_analysis.model_agnostic_eval import model_agnostic_predict\n",
    "from tensorflow_model_analysis.proto import metrics_for_slice_pb2\n",
    "from tensorflow_model_analysis import slicer\n",
    "from tensorflow_model_analysis.view.widget_view import render_slicing_metrics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "TNQ2gDEWMgXC"
   },
   "outputs": [],
   "source": [
    "# Tensorflow versions\n",
    "import tensorflow as tf\n",
    "print('Tensorflow version: {}'.format(tf.__version__))\n",
    "import tensorflow_model_analysis as tfma\n",
    "print('TFMA version: {}'.format(tfma.version.VERSION_STRING))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "mmsqduL8Jhck"
   },
   "source": [
    "Populate the following cell with the necessary constants and run it to initialize constants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "OX05mmN5SNv6"
   },
   "outputs": [],
   "source": [
    "#@title Constants { vertical-output: true }\n",
    "\n",
    "TABLE_NAME = 'bigquery-public-data.ml_datasets.credit_card_default' #@param {type:\"string\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "77Km8lS2Kctp"
   },
   "source": [
    "## **Query Dataset**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "L41KlPaPSROt"
   },
   "outputs": [],
   "source": [
    "sample_count = 3000 #@param {type:\"integer\"}\n",
    "\n",
    "row_count = pd.io.gbq.read_gbq('''\n",
    "  SELECT \n",
    "    COUNT(*) as total\n",
    "  FROM `%s`''' % (TABLE_NAME), project_id=PROJECT_ID, verbose=False).total[0]\n",
    "nested_df = pd.io.gbq.read_gbq('''\n",
    "  SELECT\n",
    "    *\n",
    "  FROM\n",
    "    `%s`\n",
    "  WHERE RAND() < %d/%d\n",
    "  ''' % (TABLE_NAME, sample_count, row_count), \n",
    "         project_id=PROJECT_ID, verbose=False)\n",
    "\n",
    "print('Full dataset has %d rows' % row_count)\n",
    "nested_df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "H0FK-2oiKnCE"
   },
   "source": [
    "## **Unnest the columns**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "YJddw6ITNEWj"
   },
   "outputs": [],
   "source": [
    "from collections import OrderedDict\n",
    "import json\n",
    "\n",
    "def unnest_df(nested_df):\n",
    "    rows_list = []\n",
    "    for index, row in nested_df.iterrows():\n",
    "        for i in row[\"predicted_default_payment_next_month\"]:\n",
    "            row_dict = OrderedDict()\n",
    "            row_dict = json.loads(row.to_json())\n",
    "            row_dict[\"predicted_default_payment_next_month_tables_score\"] = i[\"tables\"][\"score\"]\n",
    "            row_dict[\"predicted_default_payment_next_month_tables_value\"] = i[\"tables\"][\"value\"]\n",
    "            rows_list.append(row_dict) \n",
    "\n",
    "    unnested_df = pd.DataFrame(rows_list, columns=list(rows_list[0].keys()))\n",
    "    unnested_df = unnested_df.drop(\n",
    "                  [\"predicted_default_payment_next_month\"], axis=1)\n",
    "    return unnested_df\n",
    "\n",
    "df = unnest_df(nested_df)\n",
    "print(\"Unnested completed\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "HzR2cIRMSwBt"
   },
   "source": [
    "## **Data Preprocessing**\n",
    "Many of the tools we use to analyze models and data expect to find their inputs in the [tensorflow.Example](https://www.tensorflow.org/tutorials/load_data/tfrecord) format. Here, we'll preprocess our data into tf. Examples, and also extract the predicted class from our classifier, which is binary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "3RpRi-eHSoMD"
   },
   "outputs": [],
   "source": [
    "#@title Columns { vertical-output: true }\n",
    "\n",
    "unique_id_field = 'id' #@param {type: 'string'}\n",
    "prediction_field_score = 'predicted_default_payment_next_month_tables_score'  #@param\n",
    "prediction_field_value = 'predicted_default_payment_next_month_tables_value'  #@param"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "_-IImPuzTiG-"
   },
   "outputs": [],
   "source": [
    "def extract_top_class(prediction_tuples):\n",
    "  # values from Tables show up as a CSV of individual json (prediction, confidence) objects.\n",
    "  best_score = 0\n",
    "  best_class = u''\n",
    "  for val, sco in prediction_tuples:\n",
    "    if sco > best_score:\n",
    "      best_score = sco\n",
    "      best_class = val\n",
    "  return (best_class, best_score)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "IMMv_1z2TiKT"
   },
   "outputs": [],
   "source": [
    "def df_to_examples(df, columns=None):\n",
    "  examples = []\n",
    "  if columns == None:\n",
    "    columns = df.columns.values.tolist()\n",
    "  for id in df[unique_id_field].unique():\n",
    "    example = tf.train.Example()\n",
    "    prediction_tuples = zip(\n",
    "        df.loc[df[unique_id_field] == id][prediction_field_value], \n",
    "        df.loc[df[unique_id_field] == id][prediction_field_score])\n",
    "    row = df.loc[df[unique_id_field] == id].iloc[0]\n",
    "    for col in columns:\n",
    "      if col == prediction_field_score or col == prediction_field_value:\n",
    "        # Deal with prediction fields separately.\n",
    "        continue\n",
    "      elif df[col].dtype is np.dtype(np.int64):\n",
    "        example.features.feature[col].int64_list.value.append(int(row[col]))\n",
    "      elif df[col].dtype is np.dtype(np.float64):\n",
    "        example.features.feature[col].float_list.value.append(row[col])\n",
    "      elif row[col] is None:\n",
    "        continue\n",
    "      elif row[col] == row[col]:\n",
    "        example.features.feature[col].bytes_list.value.append(\n",
    "            row[col].encode('utf-8'))\n",
    "    cla, sco = extract_top_class(prediction_tuples)\n",
    "    example.features.feature['predicted_class'].int64_list.value.append(cla)\n",
    "    example.features.feature['predicted_class_score']\\\n",
    "                    .float_list.value.append(sco)\n",
    "    examples.append(example)\n",
    "  return examples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "hJPfTH-UTngy"
   },
   "outputs": [],
   "source": [
    "# Fix up some types so analysis is consistent. \n",
    "# This code is specific to the dataset.\n",
    "df = df.astype({\"pay_5\":float, \"pay_6\":float})\n",
    "\n",
    "# Converts a dataframe column into a column of 0's and 1's based on the provided test.\n",
    "def make_label_column_numeric(df, label_column, test):\n",
    "  df[label_column] = np.where(test(df[label_column]), 1, 0)\n",
    "  \n",
    "# Convert label types to numeric. This code is specific to the dataset.\n",
    "make_label_column_numeric(df, \n",
    "                          'predicted_default_payment_next_month_tables_value', \n",
    "                          lambda val: val == '1')\n",
    "make_label_column_numeric(df, 'default_payment_next_month', \n",
    "                               lambda val:  val == '1')\n",
    "\n",
    "examples = df_to_examples(df)\n",
    "print(\"Preprocessing complete!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "sn7Y9In0TwJe"
   },
   "source": [
    "## **What-If Tool**\n",
    "First, we'll explore the data and predictions using the [What-If Tool](https://pair-code.github.io/what-if-tool/). The What-If tool is a powerful visual interface to explore data, models, and predictions. Because we're reading our results from BigQuery, we aren't able to use the features of the What-If Tool that query the model directly. But we can still learn a lot about this dataset from the exploration that the What-If tool enables.\n",
    "\n",
    "Imagine that you're curious to discover whether there's a discrepancy in the predictive power of your model depending on the marital status of the person whose credit history is being analyzed. You can use the What-If Tool to look at a glance and see the relative sizes of the data samples for each class. In this dataset, the marital statuses are encoded as 1 = married; 2 = single; 3 = divorce; 0=others. You can see using the What-If Tool that there are very few samples for classes other than married or single, which might indicate that performance could be compromised. If this lack of representation concerns you, you could consider collecting more data for underrepresented classes, downsampling overrepresented classes, or upweighting underrepresented data types as you train, depending on your use case and data availability."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "FXOrJyh5TzQw"
   },
   "outputs": [],
   "source": [
    "#@title WitWidget Configuration { vertical-output: false }\n",
    "\n",
    "WitWidget = visualization.WitWidget\n",
    "WitConfigBuilder = visualization.WitConfigBuilder\n",
    "\n",
    "num_datapoints = 2965  #@param {type: \"number\"}\n",
    "tool_height_in_px = 700  #@param {type: \"number\"}\n",
    "\n",
    "# Setup the tool with the test examples and the trained classifier.\n",
    "config_builder = WitConfigBuilder(examples[:num_datapoints])\n",
    "# Need to call this so we have inference_address and model_name initialized.\n",
    "config_builder = config_builder.set_estimator_and_feature_spec('', '')\n",
    "config_builder = config_builder.set_compare_estimator_and_feature_spec('', '')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "Qfmr0cQCGBmu"
   },
   "outputs": [],
   "source": [
    "WitWidget(config_builder, height=tool_height_in_px)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "n3u5yCG2T8zz"
   },
   "source": [
    "## **Tensorflow Model Analysis** \n",
    "Then, let's examine some sliced metrics. This section of the tutorial will use [TFMA](https://github.com/tensorflow/model-analysis) model agnostic analysis capabilities.\n",
    "\n",
    "TFMA generates sliced metrics graphs and confusion matrices. We can use these to dig deeper into the question of how well this model performs on different classes of marital status. The model was built to optimize for AUC ROC metric, and it does fairly well for all of the classes, though there is a small performance gap for the \"divorced\" category. But when we look at the AUC-PR metric slices, we can see that the \"divorced\" and \"other\" classes are very poorly served by the model compared to the more common classes. AUC-PR is the metric that measures how well the tradeoff between precision and recall is being made in the model's predictions. If we're concerned about this gap, we could consider retraining to use AUC-PR as the optimization metric and see whether that model does a better job making equitable predictions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "WVV0XThVadZM"
   },
   "outputs": [],
   "source": [
    "# To set up model agnostic extraction, need to specify features and labels of\n",
    "# interest in a feature map.\n",
    "feature_map = OrderedDict();\n",
    "\n",
    "for i, column in enumerate(df.columns):\n",
    "  type = df.dtypes[i]\n",
    "  if column == prediction_field_score or column == prediction_field_value:\n",
    "    continue\n",
    "  elif (type == np.dtype(np.float64)):\n",
    "    feature_map[column] =  tf.io.FixedLenFeature([], tf.float32)\n",
    "  elif (type == np.dtype(np.object)):\n",
    "    feature_map[column] =  tf.io.FixedLenFeature([], tf.string)\n",
    "  elif (type == np.dtype(np.int64)):\n",
    "    feature_map[column] = tf.io.FixedLenFeature([], tf.int64)\n",
    "  elif (type == np.dtype(np.bool)):\n",
    "    feature_map[column] = tf.io.FixedLenFeature([], tf.bool)\n",
    "  elif (type == np.dtype(np.datetime64)):\n",
    "    feature_map[column] = tf.io.FixedLenFeature([], tf.timestamp)\n",
    "\n",
    "feature_map['predicted_class'] = tf.io.FixedLenFeature([], tf.int64)\n",
    "feature_map['predicted_class_score'] = tf.io.FixedLenFeature([], tf.float32)\n",
    "\n",
    "serialized_examples = [e.SerializeToString() for e in examples]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "36eU_bZjf0ci"
   },
   "outputs": [],
   "source": [
    "BASE_DIR = tempfile.gettempdir()\n",
    "OUTPUT_DIR = os.path.join(BASE_DIR, 'output')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "aMlNa-UQPg-n"
   },
   "outputs": [],
   "source": [
    "#@title TFMA Inputs { vertical-output: false }\n",
    "\n",
    "slice_column = 'marital_status' #@param {type: 'string'}\n",
    "predicted_labels = 'predicted_class' #@param {type: 'string'}\n",
    "actual_labels = 'default_payment_next_month' #@param {type: 'string'}\n",
    "predicted_class_score = 'predicted_class_score' #@param {type: 'string'}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "1avSDsaVPrwb"
   },
   "outputs": [],
   "source": [
    "  with beam.Pipeline() as pipeline:\n",
    "    model_agnostic_config = model_agnostic_predict.ModelAgnosticConfig(\n",
    "              label_keys=[actual_labels],\n",
    "              prediction_keys=[predicted_labels],\n",
    "              feature_spec=feature_map)\n",
    "\n",
    "    extractors = [\n",
    "            model_agnostic_extractor.ModelAgnosticExtractor(\n",
    "                model_agnostic_config=model_agnostic_config,\n",
    "                desired_batch_size=3),\n",
    "              slice_key_extractor.SliceKeyExtractor([\n",
    "                  slicer.SingleSliceSpec(columns=[slice_column])\n",
    "              ])\n",
    "        ]\n",
    "\n",
    "    auc_roc_callback = post_export_metrics.auc(\n",
    "        labels_key=actual_labels,\n",
    "        target_prediction_keys=[predicted_labels])\n",
    "\n",
    "    auc_pr_callback = post_export_metrics.auc(\n",
    "        curve='PR',\n",
    "        labels_key=actual_labels,\n",
    "        target_prediction_keys=[predicted_labels])\n",
    "\n",
    "    confusion_matrix_callback = post_export_metrics\\\n",
    "    .confusion_matrix_at_thresholds(\n",
    "        labels_key=actual_labels,\n",
    "        target_prediction_keys=[predicted_labels],\n",
    "        example_weight_key=predicted_class_score,\n",
    "        thresholds=[0.0, 0.5, 0.8, 1.0])\n",
    "\n",
    "    # Create our model agnostic aggregator.\n",
    "    eval_shared_model = types.EvalSharedModel(\n",
    "        construct_fn=model_agnostic_evaluate_graph.make_construct_fn(\n",
    "            add_metrics_callbacks=[confusion_matrix_callback,\n",
    "                                    auc_roc_callback,\n",
    "                                    auc_pr_callback,\n",
    "                                    post_export_metrics.example_count()],\n",
    "            config=model_agnostic_config))\n",
    "\n",
    "    # Run Model Agnostic Eval.\n",
    "    _ = (\n",
    "        pipeline\n",
    "        | beam.Create(serialized_examples)\n",
    "        | 'ExtractEvaluateAndWriteResults' >>\n",
    "          model_eval_lib.ExtractEvaluateAndWriteResults(\n",
    "              eval_shared_model=eval_shared_model,\n",
    "              output_path=OUTPUT_DIR,\n",
    "              extractors=extractors))\n",
    "\n",
    "eval_result = tfma.load_eval_result(output_path=OUTPUT_DIR)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "B0OFjuIbF_jz"
   },
   "outputs": [],
   "source": [
    "render_slicing_metrics(eval_result, slicing_column=slice_column)"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "slicing_eval_results.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
